43 research outputs found

    Multi-Softcore Architecture on FPGA

    Get PDF
    To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standards intellectual properties yet maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes a FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and also some parallel applications are used for testing speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication)

    Multi-Softcore Architecture on FPGA

    Get PDF
    To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standards intellectual properties yet maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes a FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and also some parallel applications are used for testing speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication)

    Mppsocgen: A framework for automatic generation of mppsoc architecture

    Full text link
    Automatic code generation is a standard method in software engineering since it improves the code consistency and reduces the overall development time. In this context, this paper presents a design flow for automatic VHDL code generation of mppSoC (massively parallel processing System-on-Chip) configuration. Indeed, depending on the application requirements, a framework of Netbeans Platform Software Tool named MppSoCGEN was developed in order to accelerate the design process of complex mppSoC. Starting from an architecture parameters design, VHDL code will be automatically generated using parsing method. Configuration rules are proposed to have a correct and valid VHDL syntax configuration. Finally, an automatic generation of Processor Elements and network topologies models of mppSoC architecture will be done for Stratix II device family. Our framework improves its flexibility on Netbeans 5.5 version and centrino duo Core 2GHz with 22 Kbytes and 3 seconds average runtime. Experimental results for reduction algorithm validate our MppSoCGEN design flow and demonstrate the efficiency of generated architectures.Comment: 16 pages; International Journal of Computer Science & Information Technology (IJCSIT) Vol 4, No 2, April 201

    Broadcast with mask on a Massively Parallel Processing on a Chip

    Get PDF
    workshop drnoc2012The delay of instructions broadcast has a significant impact on the performance of Single Instruction Multiple Data (SIMD) architecture. This is especially true for massively parallel processing Systems-on-Chip (mppSoC), where the processing stage and that of setting up the communication mechanism need several clock periods. Subnetting is the strategy used to partition a single physical network into more than one smaller logical sub-networks (subnets). This technique better controls the broadcast instructions domain and the data traffic between network nodes. Furthermore, it allows to separate synchronous communications from asynchronous processing which maintains reliable communications and rapid processing through parallel processors. This paper describes the design of a communication model called broadcast with mask. This model is dedicated to mppSoC architecture with a huge number of processor elements because it maintains performances even when the number of processors increases. Simulation results and an FPGA implementation validate our approach

    Analyse tribologique du rôle de constituants dans les performances de matériaux composites organiques pour garnitures de frein

    Get PDF
    Le rôle des constituants au sein des matériaux de friction pour garniture de frein est encore mal connu du fait, d une part de la complexité des formulations de ces matériaux composites organiques, d autre part de la sollicitation induite par le freinage qui engendre de multiples phénomènes physiques. Il s avère donc particulièrement difficile d établir les liens entre la composition d une garniture et ses performances. Les travaux ont concerné une garniture de frein industrielle pour véhicule poids lourd. Une étude préalable a été réalisée du point de vue de sa microstructure et de ses constituants, de ses propriétés, de son élaboration et de ses performances en freinage. La démarche expérimentale adoptée a reposé sur le développement i) d une formulation modèle , déduite de la formulation industrielle par simplification de la composition, ii) d un essai d usure représentatif de sollicitations de freinage, pour aborder le fonctionnement tribologique des constituants dans une configuration simplifiée. Deux constituants ont été plus particulièrement étudiés, des particules de laiton et des fibres de verre introduites dans la formulation modèle prise en référence. L analyse des performances a porté sur la compréhension des liens entre les mécanismes de frottement et d usure, la microstructure et les propriétés des matériaux et leur comportement tribologiqueThe role of the constituents within friction materials for brake lining still largely unknown due to the complexity of these organic composite formulations and to the induced solicitation by braking which generates multiple physical phenomena. Therefore, it is especially difficult to establish the relationship between the composition of a brake lining and its performance. The work has concerned an industrial brake lining for heavy vehicle. A preliminary study was conducted in terms of its microstructure, constituents, properties, elaboration and its braking performance. The experimental approach was based, on the one hand, on the development of a model formulation, derived from the simplification of the industrial one and, on the other hand, on a specific wear test representative of braking solicitations, to take up the tribological behaviour of constituents in the framework of a simplified configuration. Brass particles and glass fibres have been particularly studied, introduced into the model formulation taken as a reference. The analysis of performance has focused on understanding the relationship between friction and wear mechanisms, microstructure and properties of materials, and their tribological behaviourVILLENEUVE D'ASCQ-ECLI (590092307) / SudocSudocFranceF

    IP based configurable SIMD massively parallel SoC

    Get PDF
    International audienceSignificant advances in the field of configurable computing have enabled parallel processing within a single Field- Programmable Gate Array (FPGA) chip. This paper presents the implementation of a flexible and programmable Single Instruc- tion Multiple Data (SIMD) processing system on FPGA that can be adapted to the application. Its implementation is based on an IP (Intellectual Property) assembling approach making its design fast and easy. A generation tool is also developed to generate the SIMD configuration depending on the application requirements. The proposed parallel processing system on chip is portable, scalable and flexible since it can be customized to match the needs of a data parallel application. Based on FPGA, different SIMD configurations have been evaluated in terms of performance and area trade-offs. The proposed parametric system shows good results executing some signal processing applications such as parallel matrices multiplication, FIR filter and RGB to YIQ image color conversion

    Defensive Approximation: Securing CNNs using Approximate Computing

    Full text link
    In the past few years, an increasing number of machine-learning and deep learning structures, such as Convolutional Neural Networks (CNNs), have been applied to solving a wide range of real-life problems. However, these architectures are vulnerable to adversarial attacks. In this paper, we propose for the first time to use hardware-supported approximate computing to improve the robustness of machine learning classifiers. We show that our approximate computing implementation achieves robustness across a wide range of attack scenarios. Specifically, for black-box and grey-box attack scenarios, we show that successful adversarial attacks against the exact classifier have poor transferability to the approximate implementation. Surprisingly, the robustness advantages also apply to white-box attacks where the attacker has access to the internal implementation of the approximate classifier. We explain some of the possible reasons for this robustness through analysis of the internal operation of the approximate implementation. Furthermore, our approximate computing model maintains the same level in terms of classification accuracy, does not require retraining, and reduces resource utilization and energy consumption of the CNN. We conducted extensive experiments on a set of strong adversarial attacks; We empirically show that the proposed implementation increases the robustness of a LeNet-5 and an Alexnet CNNs by up to 99% and 87%, respectively for strong grey-box adversarial attacks along with up to 67% saving in energy consumption due to the simpler nature of the approximate logic. We also show that a white-box attack requires a remarkably higher noise budget to fool the approximate classifier, causing an average of 4db degradation of the PSNR of the input image relative to the images that succeed in fooling the exact classifierComment: ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021

    Defending with Errors: Approximate Computing for Robustness of Deep Neural Networks

    Full text link
    Machine-learning architectures, such as Convolutional Neural Networks (CNNs) are vulnerable to adversarial attacks: inputs crafted carefully to force the system output to a wrong label. Since machine-learning is being deployed in safety-critical and security-sensitive domains, such attacks may have catastrophic security and safety consequences. In this paper, we propose for the first time to use hardware-supported approximate computing to improve the robustness of machine-learning classifiers. We show that successful adversarial attacks against the exact classifier have poor transferability to the approximate implementation. Surprisingly, the robustness advantages also apply to white-box attacks where the attacker has unrestricted access to the approximate classifier implementation: in this case, we show that substantially higher levels of adversarial noise are needed to produce adversarial examples. Furthermore, our approximate computing model maintains the same level in terms of classification accuracy, does not require retraining, and reduces resource utilization and energy consumption of the CNN. We conducted extensive experiments on a set of strong adversarial attacks; We empirically show that the proposed implementation increases the robustness of a LeNet-5, Alexnet and VGG-11 CNNs considerably with up to 50% by-product saving in energy consumption due to the simpler nature of the approximate logic.Comment: arXiv admin note: substantial text overlap with arXiv:2006.0770

    G-MPSoC: Generic Massively Parallel Architecture on FPGA

    Get PDF
    International audienceNowadays, recent intensive signal processing applications are evolving and are characterized by the diversity of algorithms (filtering, correlation, etc.) and their numerous parameters. Having a flexible and pro-grammable system that adapts to changing and various characteristics of these applications reduces the design cost. In this context, we propose in this paper Generic Massively Parallel architecture (G-MPSoC). G-MPSoC is a System-on-Chip based on a grid of clusters of Hardware and Software Computation Elements with different size, performance, and complexity. It is composed of parametric IP-reused modules: processor, controller, accelerator, memory, interconnection network, etc. to build different architecture configurations. The generic structure of G-MPSoC facilitates its adaptation to the intensive signal processing applications requirements. This paper presents G-MPSoC architecture and details its different components. The FPGA-based implementation and the experimental results validate the architectural model choice and show the effectiveness of this design

    Master-Slave Control structure for massively parallel System on Chip

    Get PDF
    16th Euromicro Conference on Digital System DesignInternational audienceThe performance of massively parallel processing system depends mostly on the control configuration that is inherently part of the system. In particular, centralized control configuration is rigid and limits system scalability, and distributed control configuration is difficult to control in processing elements (PEs) interaction. Maintaining a flexible autonomous computation coupled with regular synchronous communication can assure a efficient parallel processing. The master-slave control structure is specified in such a way that previous features of the massively parallel System-on-Chip (mpSoC) are preserved and performance is improved. In this paper, we define the prototyping of a master-slave control structure for mpSoC in a FPGA-based platform. The structure implementation and related experiments using the vhdl language running on virtex6 ml605 of Xilinx board are described
    corecore